Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise

نویسندگان

  • Felix Weninger
  • Martin Wöllmer
  • Björn W. Schuller
چکیده

We address the speaker independent automatic recognition of spontaneous speech in highly variable noise by applying semisupervised sparse non-negative matrix factorization (NMF) for speech enhancement coupled with our recently proposed frontend utilizing bottleneck (BN) features generated by a bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. In our evaluation, we unite the noise corpus and evaluation protocol of the 2011 PASCAL CHiME challenge with the Buckeye database, and we demonstrate that the combination of NMF enhancement and BN-BLSTM front-end introduces significant and consistent gains in word accuracy in this highly challenging task at signal-to-noise ratios from -6 to 9 dB.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...

متن کامل

Real-Time Speech Separation by Semi-supervised Nonnegative Matrix Factorization

In this paper, we present an on-line semi-supervised algorithm for real-time separation of speech and background noise. The proposed system is based on Nonnegative Matrix Factorization (NMF), where fixed speech bases are learned from training data whereas the noise components are estimated in real-time on the recent past. Experiments with spontaneous conversational speech and real-life nonstati...

متن کامل

Iterative Weighted Non-smooth Non-negative Matrix Factorization for Face Recognition

Non-negative Matrix Factorization (NMF) is a part-based image representation method. It comes from the intuitive idea that entire face image can be constructed by combining several parts. In this paper, we propose a framework for face recognition by finding localized, part-based representations, denoted “Iterative weighted non-smooth non-negative matrix factorization” (IWNS-NMF). A new cost fun...

متن کامل

The Tum+tut+kul Approach to the 2nd Chime Challenge: Multi-stream Asr Exploiting Blstm Networks and Sparse Nmf

We present our joint contribution to the 2nd CHiME Speech Separation and Recognition Challenge. Our system combines speech enhancement by supervised sparse non-negative matrix factorisation (NMF) with a multi-stream speech recognition system. In addition to a conventional MFCC HMM recogniser, predictions by a bidirectional Long Short-Term Memory recurrent neural network (BLSTM-RNN) and from non...

متن کامل

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for contextsensitive Tandem feature extraction a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012